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Abstract 



We address the task of higher-order derivative evaluation of computer programs that contain QR 
decompositions and real symmetric eigenvalue decompositions. The approach is a combination of uni- 
variate Taylor polynomial arithmetic and matrix calculus in the (combined) forward/reverse mode of 
Algorithmic Differentiation (AD). Explicit algorithms are derived and presented in an accessible form. 
The approach is illustrated via examples. 
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1 Introduction and related work 

This paper is concerned with the efficient evaluation of higher-order derivatives of functions F : M. N — > M M 
which are implemented as computer programs that contain numerical linear algebra functions like the QR 
or the real symmetric eigenvalue decomposition. 

Traditionally, Algorithmic Differentiation (AD) tools like ADOL-C ||GJM+99l or CppAD liBellOl regard 
the functions defined in the C header file math.h as elementary functions. In the forward mode of AD, their 
approach to compute higher-order derivatives is to generalize from real arithmetic to univariate Taylor poly- 
nomial (UTP) arithmetic I GJ M+991 iGUWOOl IGW08 1 . For the reverse mode of AD, the program evaluation 
is traced and stored in a computational graph or on a sequential tape. During the so-called reverse sweep the 
stored intermediate values are retrieved and used to compute derivatives (c.f. Section 0]). 
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2 APPLICATION EXAMPLES FOR THE PROPOSED ALGORITHMS 



As explained in Section[3j the functions in math.h suffice since all computable functions are a concatenation 
of these functions. However, working only at the expression level has also disadvantages since no global 
knowledge of the function's structure can be used. A particularly important class of algorithms in science 
and engineering are numerical linear algebra (NLA) functions. Though NLA functions are typically locally 
smooth, their implementations often contain non-differentiable operations and program branches. If no spe- 
cial care is taken, this may result in incorrect computations of derivatives. Also, many NLA functions on 
R NxN matrices require 0(N 3 ) arithmetic operations. Since during the reverse mode intermediate results are 
required, this would yield an &(N 3 ) memory requirement. Though it may be possible to adapt codes to 
yield reduced memory requirements, as for instance reported for the LU decomposition I Gri03L in practice 
it can be a cumbersome and error-prone process. Also one would like to reuse existing, high-performance 
implementations of NLA algorithms. Adding the NLA functions to the list of elementary functions circum- 
vents this problem. This has been realized before HBuc021 IBH961 and also UTP algorithms for some NLA 
functions (e.g. the solution of linear equations) have been implemented in software [Eri03]. 

The contribution of this paper is to provide explicit algorithms for UTP arithmetic applied to the QR decom- 
position and the real symmetric eigenvalue decomposition. Note that our approach to the real symmetric 
eigenvalue decomposition is similar to HAT981 |vdAMM07 ] but our algorithmic result differs. In addition, 
we also treat the reverse mode of AD. 

The document is structured as follows: In Section [2] we give two application examples for the algorithms 
presented in this document, followed by a brief review of the underlying computational model in Section [3] 
We shortly describe the basics of AD in Section [4] where we make use of the results from [3] In Section [5] 
we describe the general approach of NLA functions. After that, we apply the results from Section 0] to find 
extended functions for the QR and eigenvalue decomposition in Section [6] and [7] and also provide pullback 
algorithms that are necessary in in the reverse mode of AD. Finally, we present some numerical results in 
Section [U 

2 Application examples for the proposed algorithms 

The purpose of this section is to show two application examples where higher-order derivatives of computer 
programs that contain the QR and the real symmetric eigenvalue decomposition are necessary. 

2.1 Optimum experimental design 

The goal in optimum experimental design (OED) is to minimize some cost function representing the size of 
the confidence region of parameters of interest. We consider here a popular formulation where the objective 
function <&(C) G R depends on the covariance matrix C G R n p xN p of a constrained parameter estimation 
problem, where the covariance matrix is computed by 



and we assume that Ji = Ji(q) G R n " xN p, J 2 = J 2 {q) G R n * xN p, I G R n p xN p and q G R N ". The notation is 
motivated as follows: p G R Np are model (pseudo-)parameters, q G R Nq are control variables and J\ and J 2 are 



C 




(2.1) 
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2.2 Index determination of differential algebraic equations 



Jacobians of the residuals resp. of the constraint function with respect to the parameters p. Typical choices 
for cost function <1> are the trace, the determinant or the maximum eigenvalue of the covariance matrix C. 
Though Eqn. d2.il ) correctly describes the covariance matrix C, the actual algorithmic implementation is 
often a code like 

C = QliQiJlhQlY'Qi, (2.2) 

where Q2 results from a QR-like decomposition of J2, i.e. j\ = {Q\, <2j)(L,0) r . The matrices J\ and J2 
are assumed to satisfy the constraint qualification rank (J2)) = N r and the condition rank (J) = N p , where 
J = (Jjjl) 7 . The matrix Q 2 spans the nullspace of J2. For a detailed discussion we refer to Korkel IIKor02ll 
and to Bock and Kostina HKKSB07H . 

Newton-type optimization algorithms require at least the gradient V q <i>(q) of the objective function <I>. To 
obtain good convergence near the local minimizer, it is often advantageous if exact second-order derivatives 
are available. Since the number of controls N q can be large, one would like to have the possibility to compute 
these derivatives in the reverse mode of AD. Robust objective functions are often formulated in a way that 
require third and even higher-order derivatives, so it is necessary to have algorithms that scale easily to 
arbitrary order. 

Thus, this example requires the differentiation of the nullspace of a matrix, the matrix product, matrix 
inversion, the QR decomposition and the objective function evaluation, e.g. the eigenvalue decomposition. 

2.2 Index determination of differential algebraic equations 

Many dynamical problems in chemical engineering, rigid body mechanics, circuit simulation and control 
theory are described by Differential Algebraic Equations (DAEs) of the form 

= / (&d(y,x),y,x) , x e / = [a,b] C M, (2.3) 

where y : R -)■ R m lives in suitable function space, / : R" x R m X /->• R m , d : R m x / -)• R" are sufficiently 
smooth and typically n is smaller than m. 

Using higher-order derivatives of the functions in the DAE one can, in general, transform the DAE system 
into an ODE system of order one. The differentiation index is the highest derivative order required in this 
process, that is, derivatives of up to this order of the original equations are part of any solution of the DAE. 
The knowledge of the index allows to estimate the difficulty to solve the DAE. 

There are many different index definitions. Here we consider the tractability index. To compute it, the DAE 
is linearized along a given function y(x) as 




with w(x) = 4-d(y(x),x). The coefficient functions A =A(x), D = D(x) and B = B(x) give rise to a matrix 
sequence 

Go = AD, B = B, 

G m = Gi + BtQi, B i+ i = BiPt - G i+1 D £ (DP Q ■ ■ ■ P i+l D )DP Q ■ ■ ■ Pi, (2.4) 
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3 COMPUTATIONAL MODEL 



where <2, describes a projector onto ker G,, P, ■ = I — Q\ and D is a generalized reflexive inverse of D. Now, 
the tractability index is the smallest number fx G PJ where is nonsingular. The projectors <2; can be 
determined mainly by use of a QR decomposition. 

A QR decomposition of the potentially singular matrix G G ]j MxM with rank (7 = r results in 

where 77 describes a column permutation matrix, Q an orthogonal matrix and R\ G W xr an upper triangular 
matrix. Then a nullspace projector Qq onto kerG is given by 

The computation of via (12.41) needs the differentiation of DPo • • • Pi + \D with respect to x. Thus, higher- 
order derivatives of a function that contains the QR decomposition are necessary. For a in-depth discussion 
of index definition of DAEs see Marz HMar021lMar03ll . 

3 Computational model 

We consider functions 

F:R N -> R M 

X H-» y = F (x) , 

that can be described by the three-part form 

V n -N = *n H = l,...,N 

Vl = (f>i(v H i) l = l,...,L 
jM-m = V L - m m = M-l,...,0, 

where 0/ G {+,—,-,/, sin, exp, ... } are called elementary functions, v/ are intermediate values and v,^/ 
denote the tuples of input arguments of 0/. For instance the function F : M? — > R, x h >• y = F(x) = 
sin(xi + cos(x 2 ) *xi ) is described by 



independent 


V-l = 


= Xl 


= 3 


independent 


V 


= x 2 


= 7 




Vl 


= 0i (vo) 


= cos(v ) 




v 2 


= 02(vi,V_i) 


= ViV_! 




v 3 


= <Mv-i,v 2 ) 


= V_i+V 2 




v 4 


= 04 (V3) 


= sin(v3) 


dependent 


J 


= V 4 





It shows a sequential representation of the computation. Alternatively, one can describe the function evalu- 
ation as composite function 

F(x) = / J y o$ L o$ L _ 1 o...o$ 1 oPj(x), (3.1) 
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where <J>/ : M 1 — > Jtf, s^ 1 ! ) h >• jrW = <I>/(j(' ^) are called elementary transitions that operate the stafe s/jace 
J^ 7 . Each elementary transition can be written as 

Si = fl o ft o g/ + (I - (1 - Oi)P,oP[) . (3.2) 

where the functions 0/ : ^/ C J;- ;> J£? e {+, —,*,/, sin, exp, . . . } are the elementary functions. The Q\ : 
— >■ map to the domains of the elementary functions and the P\ : M\ — > Jf? map back to the overall state 
space. The functions Pf and P y are used to map the independent variables x to the state s^ 3 ' and s^ to y. 
The case a/ = 1 corresponds to an augmented assignment si = si + <pi (si ) and Oi = to the usual assignment 
57 = (j)i(si). For our purposes it suffices to consider a real vector space as state space, i.e., the mappings P/ 
and Qi can be written as matrices. For a more detailed discussion see Griewank [Gri03|. 



4 Algorithmic differentiation 

In this section we briefly review some key results from the theory of AD that will be necessary in Section 
[6] and |7J For a detailed discussion we refer to the standard reference "Evaluating Derivatives" by Griewank 
and Walther HGW08L 



4.1 The forward mode 

One can use univariate Taylor series expansions to compute higher-order (directional) derivatives. The basic 
observation is that given a smooth curve x(t) = xq +x\t with t G (—£,£), £ > 0, and a smooth function F 
one obtains a smooth curve y(t) = F(x(t)) with the Taylor series expansion 



y(t) = D fy d t d +0(t D ) = °f ^F(x(t)) 

d=Q d=0 a ■ al 



t d + 0(t D ) . (4.1) 



t=0 



By application of the chain rule one can interpet the terms of the expansion. The zeroth derivative is the nor- 
mal function evaluation yo =F(xq) and the first coefficient y\ = ^F(x(t)) | „ = -^F(xq) -x\ is a directional 
derivative. 

In the context of AD it is advantageous to generalize the notion of Taylor series expansions to a purely alge- 
braic task. In other words, for arithmetic with univariate Taylor polynomials (UTP) one extends functions 
F : R N R N to functions E D (F) : R N [T]/(T D ) R M [T]/(T D ). We denote representing elements of the 
polynomial factor ring R [T]/ (T D ) as 



D-1 

[x} D := [x]_,...,xd-i] ■= Xdjd > ( 4 - 2 ) 

d=0 



where Xd G M. N is called Taylor coefficient. The quantity T is an indeterminate, i.e., a formal variable. The 
extended function Ed{F) is defined by its action 



ty] D = E D (F)([x] D )Jfy d T d = Y^FC£x d t d ) 

d=0 d=0 " • Uf d=Q 



T d . (4.3) 

(=0 



4 ALGORITHMIC DIFFERENTIATION 



The notation [x]d = [x] : d-i and [jc]<m-i : d_i = Hrf+i: = fe+i > ■ ■ ■ >*d-i] Wli l be useful later on. One can show 
that this definition is compatible with the usual polynomial addition and multiplication. Furthermore, any 
composite function F (x) = (HoG)(x) =H(G(x)) satisfies 

E D (F) = E D (H)oE D (G). (4.4) 

I.e., the extension function E D is a homomorphism which preserves function composition. An immediate 
consequence is that it is necessary to find algorithms only for the very limited set of elementary functions 
<p G {+,—,*,/, sin , cos , exp ,...}. Explicitly, one performs the program transformation Ejj(F) = Ed{^l) ° 
• • • o Ed(&i )(Wd)- We call the action of computing [y]o = Ed(F)([x]d), i.e., the resolution of the symbolic 
dependence to obtain the numerical value \y]p, the pushforward of the function Ed{F). 

Many functions are implicitly defined by equations of the type = F(x,y) G M M , where x £ M. N are the 
inputs and y G M M the outputs. The idea is to demand that the defining equations of order D 

0=E D (F)([x] D ,[y] D ) (4.5) 

should be satisfied. By = it is meant that [jc]=[y] if xj = y^ for d = 0, . . . ,D — 1. This is also often 
written either as [x] = \y] + G(J D ) or [x] = [y] mod T D . The defining equations lead directly to an al- 
gorithmic approach to compute \y\o, the so-called Newton-Hensel lifting. In the literature it is often also 
just called Hensel-lifting or Newton's method HGW08I1 . Assuming \y\D is already known and satisfies 
0=Eo(F)([x], \)>]d), one can lift the computation to ahigher degree. Explicitly, one tries to solve D = E Ed+e(F)([x], \y]o+E)- 
Splitting [y]o+£ = [y]o + [ky\ET D and performing a first order Taylor expansion of F about [y]o yields after 
a short calculation 

Me = -[F,] E l [AF] E , (4.6) 

where Ed+e(F)([x], \y]D) D = E [AF]eT d and [F } ]e '■= Ee(^)([x], Me)- Setting E = D means that at each step 
the number of correct coefficients is doubled. In this case we call it Newton 's method. In the case E = 1 only 
the next coefficient is computed. We call the special case E = 1 sequential Hensel lifting which is also the 
formula that is often given as part of the implicit function theorem. The difference is that Newton-Hensel 
lifting is a purely algebraic task. For a discussion on how to obtain asymptotically fast algorithms and for 
the nomenclature see e.g. Bernstein [BerOl , Ber08 ]. 

4.2 The reverse mode 

The basic idea of the reverse mode of AD is to pullback linear forms a to obtain an explict mapping y\-tx. 
I.e., given F : R N -> R M , y = F(x) one has 



cc{y,y) = 




(4.7) 



where x n = Y^t=\ynr§f- ■ For notational reasons one uses Y!^ =l x n dx n = x T dx. We call the action of going 
back one level of the symbolic dependence the pullback of the linear form a(y,y). For a more detailed 
discussion on calculations with differentials see Magnus and Neudecker IIMN99I . 
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It is also possible to compute higher-order derivatives by combining UTP arithmetic and the reverse mode 
of AD. For that, the UTP algorithms are interpreted as functions mapping D coefficients Xd, < d < D to D 
coefficients y^, < d < D, i.e., a mapping from ~M, NxD — > M. MxD with a special structure. One can formally 
define a linear form by 

E D (a)([y] D ,[y} D ):=[y} T D d[y] D . (4.8) 

Here, d[y]o = Yld=o&ydT d is a formal polynomial where each coefficient is a differential and [j]Jd[y]£> = 
Lm=i [.ym]z)d[y m ]/) computes the polynomial multiplication of formal polynomials. One can show that 

E D (a)(\y]D,[y]D) = MdEd (3- )([x]D)d[x] D =[x n ] T D d[x n ] D = E D a )([x] d ,[x]d) (4.9) 

ax 

holds MChr911 . One can interpret this result as follows: If \y] D = w GR M then [x] D = E D (w T §£)(Md)- 
Setting w = e, a Cartesian basis vector would yield the Taylor expansion of the fth row of the Jacobian. 
The interpretation of the Taylor coefficients as derivatives yields higher-order derivatives. If M = 1 and 
w = 1 one obtains the Taylor expansion of the gradient [x]d = Ed(VF)([x]d)- E.g., propagating the UTP 
[x]2 = Xq +X\T would yield [x\2 = Xq + x\T where x$ = V X F (x) and x\ = V^F (x) ■ x\ , i.e., a Hessian- vector 
product. 



5 Defining equations of numerical linear algebra functions 

As briefly mentioned in the introduction, Numerical Linear Algebra (NLA) functions can be viewed as 
algorithms representing a concatenation of functions like +,—,*,/, sin, cos, .. . and thus it is possible to 
apply the AD techniques described in the previous section directly to the algorithm. However, there is also 
the possibility to regard the problem from a more abstract point of view. Many NLA functions are implicitly 
defined by a system of equations. 

For instance the QR decomposition is defined by the defining equations 

= QR-A (5.1) 

= Q T Q-l (5.2) 

= P L oR, (5.3) 

where A,R £ R MxA ' with M > N and Q £ M. MxM . The functional dependence of the defining equations is 
denoted 

Q,R = qr(A). (5.4) 

Only the first rows R-n : £ M. NxN of R are nonzero. For convenience reasons we use the slicing notation 
i: j = (i,i+\,...J). 

The defining equations of the symmetric eigenvalue decomposition are given by 

= Q T AQ-A (5.5) 
= Q T Q-l (5.6) 
= (P L + P R )oA, (5.7) 

where A £ R MxM is symmetric. The functional dependence is denoted A, Q = eigh (A). We call the matrices 
{Ph)ij = 5/<i and (Pr):] = 5,-<y skeletal projectors since their elementwise product with a matrix returns 
strictly lower resp. strictly upper triangular matrices. 



6 THE QR DECOMPOSITION 



6 The QR decomposition 



Before we derive algorithms based on the defining equations, we briefly investigate what can go wrong if a 
typical implementation of the QR decomposition using Householder reflections is evaluated in UTP arith- 
metic. Consider Algorithm 5.1.1 from the book "Matrix Computations" by Golub and Van Loan [GVL96] 
which we adapted to our notation in Algorithm Q] From the AD point of view, the problematic part in the 
code is the check a = 0. Since a paradigm of AD tools is that the control flow must remain unchanged, 
the check a = only considers the zeroth coefficient xq of a UTP. Hence, if [x]2 = e\+x\T is given as 
input and x\ / 0, the algorithm will simply evaluate j3 = and return. As final result, one obtains a matrix 
[R] 2 where R { is not upper triangular. The LAPACK implementation (LAPACK-3.2.2) of DGEQRFP.f calls 
the subroutine DLARFGP.f which contains a similar check. Hence, automatic augmentation based on AD 
principles can go wrong in such cases. 



As a side remark, note that additionally the function realized by this algorithm has a pole at a = 0, producing 
numerical overflow for a « 0. 



input :xeir 

output: vGl" with vi = 1 

output: j3 € E 

a = x\.x2- 

HI) 

if o = then 

I £=0 
else 

H = ^x\ + o 
if x\ < then 

| Vi=Xl-}i 

else 

| Vi =-<t/(xi+m) 
end 

P=2v 2 l /(o + v 2 1 ) 
v = v/v\ 

end 

Algorithm 1: Householder Vector. The reflector is P v = I — j8vv T , with vi = 1. 
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6. 1 Pushforward in Taylor arithmetic 



6.1 Pushforward in Taylor arithmetic 

We now come to the higher-level approach that is based on the defining equations given in Section [5] To 
compute [Q]d, [R]d = ^d(^)([A]d) one can apply Newton-Hensel lifting to solve 

= [Q]d[R]d-[A]d (6.1) 
= [Q] T d [Q]d-1 (6.2) 
= P L o[R] D . (6.3) 

The direct application of Eqn. (14.61 ) should be avoided since F y is sparse and has a lot of structure. Rather, 
one assumes that one has already computed [Q]d and [R]d and computes the next 1 < E < D coefficients 
by performing a first order Taylor expansion [Q]d+e = [Q]d + [AQ}eT d and [R}d+e = [R]d + [AR]eT d and 
tries to solve for the yet unknown [AR]e and [A<2]e. As result one obtains Proposition [T] For convenience, 
we use the convention that Rdjj is the j'th row and j'th column of the <i'th coefficient of [R]d- 
Proposition 1. Let [A] D+E G R MxN [T]/(T D+E ) with M >N and 1 <E < D, [R] D G R MxN [T]/(T D ) where 
[R-.n,:]d i s upper triangular with nonsingular Rq-.n.-. and [Q]d £ M> MxM [T]/(T D ) orthogonal be given and 
satisfy the defining equations of order D. Then [A/?jv,:]e = [R-.n,:]d-.d+e-i an d [AQ]e = [Q]d-.d+e-i ar ? 
given by 

-[Q]d[R]d + [A]d+e (6-4) 
-[Q] T d [Q]d + I (6.5) 
~[AG] E (6.6) 

P L o ([Q] T E [AF} E [R.. N ,] E Y ) -P L o[S;, N ] E (6.7) 
[Q) T E [AF] E -([S} E + [X} E )[R} E (6.8) 
[Q]e([S]e+[X] e ) , (6.9) 

where P L G R MxN with (P L ), 7 = £,■<;. 

Proof In the Appendix I A. lTTl □ 



[AF] E T D 


D+E 


[AG] E T D 


D+E 


[S}e 


E 


p l o(\x w M 


E 


[AR]e 


E 


[AQ]e 


E 



6.2 Pullback 

Proposition 2. Let A,R,R G R MxN resp. Q,£>£ R MxM be given and it holds M >N, rank (A) =N,Q,R = 
qr (A). Then A G IR MxA ' can be computed by 

A = Q(R+{P L o(RR T -RR t + Q t Q-Q t Q))R +t ) . (6.10) 

Here, R + denotes the Moore-Penrose pseudoinverse ofR. That means it satisfies RR + R = R and since R has 
full column rank also R + R = I. 

Proof. In Appendix IA. 1.21 □ 



9 



6 THE QR DECOMPOSITION 



6.3 Explicit algorithms 



One can use Proposition [TJto derive an explicit algorithm as shown in Algorithm |2l where at each step E = 1 
is used. 



input : [A] D = [A , . . . ,A D -i], where A d G R MxN , d = 0, . .. ,D - 1 and rank (A ) =N,M>N. 
output: [<2]d = [<2o> • • • i Gd-i] matrix with orthonormal column vectors, where Q d € M. MxN , 
d = 0,...,D-l 

output: [R] D = [R Q , . . . ,R D -i] upper triangular, where R d £ R NxN ,d = 0,...,D — l 
Qo,^o = qr(A ) 

For d = 1 to D - 1 do 

AF=A d -Z d k -lQ d - k R k 

s = - \li=\Q T d-kQk 

A',:V = /'/. O (QqAFRq} n . n - .V, :V ) 
-^:JV+1: = 

/?d = ejAF-(5+X)i? 
<2j = £o(s+x) 

end 

Algorithm 2: Sequential Hensel lifting for the QR decomposition. 



The pullback can be computed in Taylor arithmetic. In the global derivative accumulation it is necessary to 
update the value of [A]d- This happens if [A]d is input of more than one function. The algorithm for the 
pullback takes this into consideration. 



input 


[Md = 


[Aq,.. 


■ Ad- 


i], where Arf€R MxJV , d = 


0,.. 


,£>- 


l,M>N 


input 


[Q]d = 


[Go,- 


■-,Qd 


where Q d 6 M. MxM , d 


= 0, 


..,D 


- 1 


input 


[R]d = 


[Ro,~ 


■ ,Rd 


i], where R d e R MxN , d = 


= 0,.. 


.,D- 


1 


input/output 


[A]d = 


[A ,.. 


■ Ad- 


i], where A d eR MxN ,d = 


0,.. 


,D- 


l,M>N 


input 


[Q)d = 


[Go,- 


■ ,Qd 


_i], where &eR MxM ,rf 


= 0, 


..,D 


-1 


input 


[R]d = 


[So,.. 


■ ,Rd 


i], where e R MxJV , d = 


= 0,.. 


.,D- 


1 



[A]d = [A]d + [Q]d- 

■ m» + fa (MdMS - fflaMS + - [g]5[g]d)) [R]£ r ) 



Algorithm 3: Pullback of the QR decomposition in Taylor arithmetic. The inputs [A]d, [Q)d, [R]d must 
satisfy the defining equations. 
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7 The real symmetric eigenvalue decomposition 



The problem of rinding eigenvalues and eigenvectors arises in a wide variety of practical applications. As 
for the QR decomposition, we want to have algorithms that compute the real symmetric eigenvalue decom- 
position in UTP arithmetic as well as pullback algorithms. The symmetric eigenvalue decomposition is also 
important since the Singular Value Decomposition (SVD) of real matrices is closely related to it. More ex- 
plicitly, one can compute the SVD of a matrix A G M MxAr of rank r., i.e., A = ULV T , where £ = diag(Ei,0), 
U = (U h U 2 ), Ui G R Mxr , V = (Vi,V 2 ), Vi G R Nxr as 



C 



A 
A T 





where 



J_(U { Uy V2U 2 
V2\Vi -Vi V2V 2 



is orthogonal |Bjo96|. 



7.1 Pushforward in Taylor arithmetic 

Given the symmetric polynomial matrix [A]d G M. NxN [T]/(T d ). The eigenvalue decomposition is the solu- 
tion [A] D , [Q]d G M NxN [T]/(T d ) of the implicit system 

= [Q] T D [A] D [Q] D -[A} D (7.1) 
= [Q] T d [Q]d-1 (7.2) 
= (Pl + Pr) °{A] d , (7.3) 

which is called the defining equations of order D. We also assume that the eigenvalues are sorted as [Ah]b < 
[A22]d < • • • < [Ann]d- The functional dependence is denoted 



[A]dMd =eigh([A] fl ). (7.4) 

Let A,Q = eigh(A) be the usual symmetric eigenvalue decomposition. We denote the diagonal of [A]d as 
[X\d = diag([A]o). If eigenvalues are repeated, i.e., multiple, the eigenvectors generalize to eigenspaces and 
the columns of Q, that are associated to such a multiple eigenvalue, are not unique. Rather, any orthonormal 
basis could be the result. This has consequences for the Hensel-Newton lifting approach, because given 
[Q]d and [R}o that satisfy the defining equations of order D it is generally not possible to find a [AQ]e and 
[AR] E such that [Q] D+E = [Q]d + [AQ] E T D and [R] d +e = [R]d + [AR] E T D satisfy the defining equations of 
order D + E. The higher-order coefficients [AA} E enforce additional conditions on the chosen basis of the 

D-\-E 

eigenspaces. A wrong choice of [Q\d means that = (Pl + Pr) ° [A]d+e cannot be satisfied. However, 
0=P^ o [A]£ + £ can be satisfied. The matrix P® is a skeletal projector with zero blocks on the main diagonal 
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whose size corresponds to the multiplicity of an eigenvalue [A]o and all other entries are ones. The multi- 
plicity m rf ([A/]z)) of an eigenvalue of level d is defined to be the number of i G N s.t. = Md- 
I.e., 



diag([A]rf) 



m d ([X l ] D ) times 



m rf ([A wd ] D ) times 



where A^ is the number of different eigenvalues at level d. We define b d 6 N A 'b+ 1 to be a vector satisfying 
md ([^n b ]D) = b„ b+l — b d b . The symbol b is used because it relates to blocks in the matrix. The elements of 

satisfy {Pjj)ij = 1 — L„ b b = i 8^ fa <j<b d ,• This notation is a little cumbersome but turns out to 

be helpful. One defines b° = [0,Af + 1]. The vector b l represents the multiplicities of the usual symmetric 
eigenvalue decomposition. E.g., for Af = 3 and b d = [1,3,4] one has 




We reformulate the overall problem as a sequence of subproblems. We call the implicit system 










d_ 
D 



[Q%[A] D [Q d ] D -[A% 
[Q d ] T D [Q d ] D -l 

(P L + P R )o[A]d 



(7.5) 
(7.6) 
(7.7) 
(7.8) 



the relaxed problem of level d and order D. I.e., it is assumed that up to order d the original problem is 
solved but only block diagonalized for the higher order coefficients. 

To give an illustrative example consider this relaxed problem of order 3 and level 2. At this point of the 
algorithm, one has potentially obtained a matrix polynomial [A] 3 = Y*d=oAdT d with coefficients of the form 



/I 



Ao 



V 



/ 2 



Ai 



3/ 



V 



A 2 



V 



(\ 3 

3 5 



V 



1 

2 



V 



I.e., Ao and Ai are already diagonal. Since there are two eigenvalues with multiplicity m 2 {[X\i,) =2 it 
follows that A2 is only block diagonal. Note that the eigenvalues are not globally sorted by value in the 
higher coefficients but only in the subblocks defined by the lower order coefficients. In this example, the 
repeated eigenvalues in the first block split at the lift from d = to d = 1. The blocks are defined by 
b l = [1,4,6,7] and b 2 = [1,3,4,6,7]. The blocks in A 2 are defined by b 2 . 



The function that solves the relaxed problem of order D and level d is denoted 

[A d ] D ,[Q d b = eigh^([A] D ). 



(7.9) 



12 



7. 1 Pushforward in Taylor arithmetic 



The idea is to implement an algorithm that successively increases d by one. For convenience we define 

[Q°] D := I and [A°) D := [A} D . 

Theorem 3. Let [A]d be given. Then the solution of 

[Q d+1 ] D ,[A d+1 ] D ^ d+l ([A]D) (7.10) 

can be computed from the solution [Q d ]o, [A rf ]D== e igh^([A]o) by computing 

[A s , s ] D - d ,[Q s , s ] D - d °= eighjQA^), (7.11) 

where s = b d b : b d +1 — 1 are slice indices and n^ = 1, . . . ,N d . All other elements of [Qlo-d an d [A\r>~d ar e 
zero. I.e., [Q\D-d and [A]i>- d are block diagonal. It holds that 

[A d+1 ] D = [A d ] d + [A} D _ d T d (7.12) 
[Q d+l b = [Q c1 ]d[Q}d, (7.13) 
where [Q]d = [Q}D-d + [AQ]dT D ~ d for some [AQ]o-d that satisfies 

0=[Q] T d [Q]d-1. (7.14) 

Proof. We need to show that [A d+l ]j}, [Q d+i ]D is a solution to the relaxed equations of level d + 1 and order 
D. From the definition of eigh j it follows that = (P l + Pr) ° [A d+l ] d+ i and = P d+l o [A d+l ] D is satisfied. 

We also know that 0=[Q d+ %[Q d+i ] D - l=[Q] T D [Q d } T D [Q d ] D [Q] D - I is fulfilled because 0=[Q d } T D [Q d ] D - I 
and 0=[<2]o[(2]o — I. Hence, it only remains to show that the third defining equation is satisfied which is 
shown by the following straight-forward calculation: 

[Q] T D [Q%[A} D [Q d ] D [Q] D -[A d+l ] D 
[Q] T D [A d ] D [Q] D -[A d+l ] D 
[Q] T D ([A d } d + [A d ] d .T d )[Q] D - [A d ] d - [A] D _ d T d 
[Q] T D [A d ] d [Q} D + [Q] T D [A d ] d: [Q] D T d - [A% - [A] D _ d T d 
[A d ]d[Q} T D [Q]D + [Q] T D - d [A d U [Q] D - d T d - [A d ] d - [A] D - d T d 

mi- d [A d umD- d T d -[A) D . d T d 

[Q] T D-d[A d ]d:[Q]D-d-[A] D - d . 

In the fifth line it has been used that the diagonalization has only to be performed for block diagonal matrices. 
If the eigenvalues are already distinct there is nothing to diagonalize and the step can be skipped. It also 
means that one may interchange [A d ] d with [Q]d- □ 

The following proposition gives us the means to diagonalize a matrix in the zeroth degree and block diago- 
nalize w.r.t. the blocks defined by the repeated eigenvalues. I.e., it gives the justification that the solution of 
Eqn. (17.111 ) can be found. In the case of distinct eigenvalues the application of this algorithm already solves 
the original problem. 



o £ 

D 
D 
D 
D 
D 
D-d 
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Proposition 4. Let [A] D+E = [A} D + [AA] E T D be given and [A d ]j), [Q?\d be a solution of the relaxed problem 
of level d = 1 and order D. Then it exist [AA d ]E and [AQ d ]E such that [A d ]u+E = W\d + [AA c1 }eT d and 
[A<2'']d+£ = [AQ d ]o + [AQ d \ET D are a solution of the relaxed problem of level d = 1 and order D + E. A 
closed form solution is 

[AA d ] E = H°[ k \e (7-15) 
[AQ d ] E = [Q d ]E([AG] E + P d o([K} E /[E} E )) (7.16) 



where[AF\ E T DD ^\Q d ]l[A\ D [Q d ] D -[K d \ D and[AG] E T DD ^ E - [K] E l[AF] E + ([A] E [AG] E - 

[AG] E [A] E ) + [2 rf ]£[AA] £ [g rf ]f: and [Ejj} E =[A d j] E — [A d ] E . The expression [K] E /[E] E denotes an element- 
wise division. P d is a matrix with only ones on the diagonal blocks defined by the multiplicity of eigenvalues 

'fj is defined s.t. P d + P d 
then Pf] is the identity matrix I. 



in Aq. Pi is defined s.t. P d +P d is a matrix full of ones. One can see here that if the eigenvalues are distinct, 



Proof. We set Q d = Q etc. for notational simplicity. Applying Newton-Hensel lifting to the defining equa- 
tions yields 

°± £ {[Q]d + [AQ]eT d ) t {[Q] d + [AQ] e T d )-\ 
= -2[AG] E + [AQ] T E [Q] E + [Q} T E [AQ} E 
-2[AG] E + 2[S] E , 

([Q]d + [AQ} e T d ) t ([A} d + [AA} e T d )([Q} d + [AQ] e T d ) - ([A] D + [AA] E T D ) 
[AF] E + [Q] T E [AA} E [Q] E + [AQ] T E [Q] E [A} E + [A]e[Q] t e [AQ} e - [AA] E 
[K} E + [X]e[A] e - [A]e[X] e - [AA] E 

[K] e + [E] e o[X]e-[AA]e. (7.17) 



D+E 
E_ 
E_ 
E 



Thus [AA} E =P d h o[K} E and [X] T E =P d o ([K} E /[E} E ). Above, [AQ] T E [Q} E =[S] E + [X] E , [S] E symmetric and 
[X]e antisymmetric (Lemma IT5T> has been used. □ 

It remains to show that Eqn. (17.141) can be satisfied. 

Lemma 5. Let [Q]d be given and it satisfies the defining equation 0=[2]J[2]o — I. Then the solution can be 

lifted to D + E with E <D. I.e., it is possible to find [Q]d+e ■= [Q]d + [AQ] E T D s.t. 0°= ' [Q\1 +e [Q}d+e ~ I- 
A closed form solution for [AQ]e is given by 

[AQ] E = [Q]e[S}e, (7.18) 

where[S] E T DD ^ E -^([Q} T D [Q} D -l). 

Proof. In Appendix IA.1.31 □ 
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7.2 Pullback 



7.2 Pullback 



The eigenvalue decomposition is non-differentiable at points where eigenvalues are repeated and hence the 
defining equations do not define a well behaved implicit mapping as described by Christianson [Chr98|. 
However, the eigenvalue decomposition is typically used within a global context where the non-uniqueness 
and non-differentiability can be worked around. Here, we give only the pullback algorithm that is correct 
for unique eigenvalues. 

Proposition 6 (Pullback of the Symmetric Eigenvalue Decomposition with Distinct Eigenvalues:). Given 
A, <2, A, Q, A, where all eigenvalues are distinct, one can compute A by 



Hij = [Xj-XiY 1 if ijtj, else (7.19) 
A = Q{A + Ho(Q T Q))Q T (7.20) 



Proof. In Appendix IA. 1.41 □ 



7.3 Explicit algorithms 



input : [Q] d = [0o, . • . , Q d -i] with = [Q] T d [Q]d - I 
input : D € N 

output: [Q] D = [Qo, ■ • ■ ,Qd-i), where = [Q] T D [Q] D - I 

for k = d to D — 1 do 

| Qk = -5 0o it/ Qf Qk-i 
end 

Algorithm 4: This algorithm computes [Q]d = qhft([0]j,D) as described in Proposition |5] using sequential 
Hensel-lifting (E = 1). 
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input : [A]d = [Ao, . . . ,Ad-i], where A d G M. NxN symmetric positive definite, d = 0, . . . ,D — 1 
output: [A] D = [Ao, . . . ,A D -i], where A G R NxN diagonal and A d G R NxN block diagonal 

d=l,...,D-\. 
output: [G]d = [Go, • • • , Gd-i] orthogonal, where Q d G R NxN 

output: b£N Nb+l , array of integers defining the blocks. The integer Nb is the number of blocks. Each 

block has the size of the multiplicity of an eigenvalue X nb of Ao s.t. for s = b„ b : b nji+ \ — 1 one has 
(Go;:,j) T AoGo;:,5 = 

A-o,Go = eigh(A ) 

compute b G 

Ejj = Ao- jj — Aq ; u 
H = P b o{\/E) 

For d = 1 to D - 1 do 

£ = AF + QlA d Qo + 5Ao + A S 
Qd = Qo(S + HoK) 
A d = P h oK 

end 

Algorithm 5: This algorithm computes [A]d, [G]d>^ = ei ghi([^]z)) as specified by |4] using sequential 
Hensel-lifting (E = 1). I.e., the zeroth coefficient is diagonalized and the higher order coefficients are block 
diagonalized. The symbol i G Nq denotes a multi-index, i.e., the summation L|;|=rf 8 oes over ai l possible i 
such that |z| = Y,k=\ ik = d. 



input : [A] D = [Ao, . . . ,Ad-i] symmetric with A d G R NxN 

output: [A] D = [A , . . . ,A D -i], where A d G R NxN diagonal for d = 0, ... ,D - 1. 

output: [G]d = [Go, • • • , Gd-i] orthogonal, where Q d G R NxN 

[A°] D = [A] D , [G°]d = I and b° = [l,N + l] 
For d = to D - 1 do 
for n fc = 1 to iV£ do 

s = b d nb : b d b+l — 1 (slice index) 

[KAd-c, [Q s ,sh-d,b d+i = eighjQA^) 
[Q s>s ] D = qm([& )S ] D ^ d ,D) 

end 

[A d + 1 ] D = [A d ] d +[A] D - d T d 

[Q d+i }D=m D [Q}D 

end 

Algorithm 6: This algorithm computes [A]d, [Q]d = e igh([A]o) as described in Theorem[3] The algorithm 
uses internally Algorithm [5] and 0] 
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8 Numerical tests and examples 



8.1 Taylor polynomial arithmetic on real symmetric eigenvalue problem 



As an example to test the validity of the pushforward in UTP arithmetic we consider the following system 
IIAT981 : 



2(0 



A(0 



1 

7! 



/ cos(x(f)) 
- sin(jc(f)) 
1 

-1 



1 



sin(.x(?)) 
cos(x(?)) 

1 

1 



cos(x(?)) 
sin(x(f)) J 



diag(x 2 



-1 

— sin(x(7)) 
cos 

1 3 

2 3x,d(--x 3 +2x 2 --x+l) + (x 3 +x 2 - 1),3jc — 1) 



where x = x(t) := 1 + ?. The constant 8 is some predefined constant. In Taylor arithmetic one obtains 



A = diag(l/2,l,l + 5,2) 

Ai = diag(l,5,5 + 5,3) 

A 2 = diag(2,8,8 + 5,0) 

A 3 = diag(0,0,6-35,0) 

A d = diag(0,0,0,0), \/d>4. 



One can see that in the case 5 = one obtains one repeated eigenvalue that splits at d = 3. We apply 
Algorithm [6] to reconstruct [A\d- The reconstructed values are denoted [A]p. The numerical results are 
shown in Fig. (18.11 ). 
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Figure 8.1: One left side one can see that error between the true X\ and reconstructed Ai is close to machine precision. 
On the right side one can see that the absolute error \%i — A2I has a jump at 8 ~ 10~ 7 . This is due to the fact that that 
the algorithm tteats eigenvalues |A,- — Xj\ < 10~ 7 as repeated eigenvalues. One can see that when 8 approaches 10 -16 
the error gets smaller. The eigenvalue A4 shows the same qualitative behavior as Ai and A3 the same as A2. 
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8.2 Covariance matrix computation 



To test the validity of the covariance matrix computation of Eqn. (12.11 ) we first check that the first directional 
derivatives of the covariance matrix C w.r.t. J\ and J2 coincide with the results from the complex-step deriva- 
tive approximation, abbreviated here for convenience as CSDA. The CSDA computes directional derivatives 
of a real- valued function y = f(x) as y ^if}£±BB) _ f(x+i^c}-l{x--ie£) ^ • e ^ ^ extracts the imaginary part 
and i = \/^T. The number e € M. can be made very small. For a detailed discussion that also shows the 
relation to AD see Martins et al. HMSA0U IMSA03L Having verified the first order derivatives by UTP 
arithmetic we can check if the UTP arithmetic on Eqn. (12.21) yields the same result. Unfortunately, it is not 
possible to use the CSDA in a straight-forward fashion since for complex matrices the QR decomposition 
does not yield an orthogonal but a unitary Q. For reproducibility we define J\ and J2 rather arbitrarily as 



/ sin(xi)x2 cos(x 2 ) \ 

exp(xi) x\X2 

xilog(x 2 ) log(l+exp(cos(xi))) 

y X2+X1 Xi(x2 + COs(xi) J 



h{xf 



x\ log [x% + 3 sin (xi^2 ) ) 
X2 exp(sin(xi) + cos(xiX2)) 



The numerical results are shown in Figure Km Note that x\ and X2 are here elements of the vector x and not 
coefficients. 
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Figure 8.2: This plot shows the maximum absolute differences of the directional derivatives at x = f (3, 1) T , where 
t € [0.1, 1] in direction i — (5,7) T . The circles show the difference between the CSDA solution and the first order UTP 
solution using Eqn. (12. U . The diamonds show the difference between the UTP solution of Eqn. ( 12. U and Eqn. (12.21 i. 
One can see that the difference is close to machine precision of IEEE 754 float64, which is approximately lCU 16 . 



9 Summary and outlook 



We have shown how computer codes containing real symmetric eigenvalue decompositions and QR decom- 
positions can be evaluated in univariate Taylor polynomial arithmetic. Furthermore, the reverse mode of AD 
has been treated. Explicit algorithms have been presented that can be used in combination with existing AD 
software, e.g. general purpose AD tools like ADOL-C llGJM+991 or CppAD MBellOl but also differentiated 
DAE solvers like SolvIND [AKJ. Numerical tests have been used to check the algorithms. 
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Other algorithms that contain the the QR decomposition and the real symmetric eigenvalue decompostion 
can be differentiated using the shown algorithms. In particular, we think of the differentiation of the SVD 
or the computation of pseudoinverses. We believe that these algorithms, in modified form, may also be 
valuable for the eigenvalue optimization problem where eigenvalues are repeated in the solution point. 
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A Additional proofs 

A.l Proofs of QR decomposition 
A.l.l Proof of Proposition Q] 

Proof. We look at the first defining equation and try to separate the known from the unknown quantities: 







D+E 



[Q]d+e[R]d+e-[A]d+e 



D+E 



([Q}d + [AQ]eT d )([R}d + [AR}eT d ) - [A] D+E 
[Q] D [R] D - [A} D+E + ( [AQ] e [R]d + [Q]d [AR]e)T' 
- [AF} E T D + ( [AQ] e [R]d + [Q]d [AR]e)T d 
-[AF} e + [AQ] e [R] e + [Q] e [AR] e . 



D+E 



D 



D+E 



E 



(A.l) 



Similarly for the second defining equation 







D+E 



[Q] T d+e[Q]d+e-1 

[Q] T d [Q]d - 1+ ([Q] T D [AQ] E + [AQ] t e [Q}d)T 
-[AG] E + [Q]l[AQ] E +[AQ]T[Q] E 
-[AG] E + [S] E + [X] E + [S} E -[X] E 

\[AG\ E , 



D+E 



D 



^0 



E 



E 



=> s 



A ADDITIONAL PROOFS 



where [S] E + [X]e = [Q\e[AQ]e an d it has been used that every matrix can be written as the sum of a 
symmetric and an antisymmetric matrix. Now multiply (1A.1I) by [£?]| from the left to obtain 

= -[Q] t e [AF]e + [Q} T e[AQ}e[R}e + [AR}e (A.2) 
■[Q] t e [AF] e + {[S] E + [X]e)[R]e + [AR]e (A.3) 
■[Q] t e [AF]e + [S] e [R]e + [X] e [R]e + [AR] £ . 

Multiplication of [R-n,i\e fr° m the right yields 

= -[Q] T e [AF]e[R: N ,]e 1 + [S]e[R]e[R:N,-]e 1 + PMifcli 1 + [AR]e[R:N,] E 1 
= -[Q]1[AF]e[R;n-]e 1 + [5 : ,:Ar] J B + [X. A r]£ + [Ai?] £ [/? :iV .] E 

^Pl°([X:, N ]e) = PL°([Q} T E[AF]E[R; N ,] E i -[S;,N]E) • 

The coefficients of X^+i: are not specified and can for instance be set to zero. Since X is antisymmetric it 
is already defined by the above equation. Since [5]^ + [X]£=[2]|[A2] E one can obtain [AQ] E as 

[AQ] E = [Q]e([S}e + [X] e ) 

because for quadratic Q one has the identity QQ T = 1. □ 

A.1.2 Proof of Proposition |2] 

Proof. We differentiate the implicit system 



= A-QR 
= Q T Q-] 
= P L oR 



and obtain 



= AA-AQR-QAR (*) 
= dQ T Q + Q T dQ (**). 



We define the antisymmetric "matrix" X := Q T dQ. Multiplication of Eqn. (*) from the left with Q T yields 

= Q T dA-XR-dR 
hence dR = Q T dA-XR. 



The multipication of this last equation from the right with the Moore-Penrose pseudoinverse R + = (R : n, : 1 , 0) 
yields the equivalent equation 

= Q T dAR + - XRR + - dRR + 
and thus P L o X = P L o( Q T dAR + ) , 
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A. 1 Proofs of QR decomposition 



where we have chosen arbitrarily that X- N+ i- = 0. Since X is antisymmetric we have 

X = (P L oX)-(P L oX) T . 

We can use these results to compute the pullback: 

tr (R T dR) + tr (Q T dQ) = tr (QRdA T ) - tr (RR T X) + tr {Q T QQ T dQ) 
= tr(QRdA T )+tr({Q T Q-RR T )X) 

S „ ' 

=:K 

= tr (QRdA T ) + tr ( (K - K T ) (P L o X) ) 

= tr {QRdA T ) + tr (R +T dA T Q(P L o (K T - K))) 

= tr (Q[R + {P L o(Q T Q- Q T Q + RR T - RR T )}R +T }dA 7 

= tr(AdA r ) . 

In the above derivation we have used Lemmas |7J [8] and |9l 



A.1.3 Proof of Lemma|5] 

Proof. 

D ^ E ([Q]d + [AQ}eT d ) t ([Q] d + [AQ}eT d ) 



D+E 
E 



{{Q]1[Q]d - 1) + (IQ] t d [AQ]e + [AQ] T e[Q]d)t d 
[AG]e + [Q] T e[AQ]e + [AQ} T e [Q]e 



[AG]e + 2[S} e 

4" 



[AQ] E = -\[Q}e[AG}e 



□ 



where [AQ) T E [Q] E = [S] E + [X] E , [S] E symmetric and [X] E antisymmetric and [AG E ] E T D °= E (Q T Q-l) . 
Since no condition defines constraints on [X] E it has been set to zero. □ 



A.1.4 Proof of Proposition |6] 

Proof. We want to compute tr (A T dA) = tr (A T dA) + tr (Q T dQ). We differentiate the implicit system 

= Q T AQ - A 
= Q T Q- 1 
= (P L + P R )oA 

and obtain 

dA = Q T dAQ + dQ T AQ + Q T AdQ 
= Q T dAQ + dQ T QA + AQ T dQ 
= dQ T Q + Q T dQ. 
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A ADDITIONAL PROOFS 



A straight forward calculation shows: 



tr(A r dA) = tr(£A0 T dA)+te(AAd£ T 0)+tr(AA£ T d£) 

= tr(eAe T dA), 

tr(<2 r d<2) = tr(Q T Q(Ho(Q T dAQ))) 

= tv(Q(H T o(Q T Q))Q T dA), 

tr(A r dA) = tr((Q(A + H T o(Q T Q))Q T )dA) 



where we have used 



dA = Q T dAQ-(Q T dQ) T A + AQ T dQ 
= Q T AAQ-Ko(Q T AQ) 
==> Q T &Q = Ho(Q T dAQ-dA) 
= Ho(Q T dAQ) 

where we have defined := Ajj — An and = (Ktj)~ l for i / j and Hij = otherwise and used the 
property XA - AX = K o X for all X G R Nx N and diagonal AgR NxN . □ 

A.2 Basic results used in the proofs 

Lemma 7. Let X G M A ' xA ' fee ara antisymmetric matrix, i.e., X T = —X and Pl defined as above. We then can 
write 

X = P L oX-(P L oX) T . (A.4) 

Proof. X = P L oX + P R oX = P L oX + (P L oX T ) T =P L oX-(P L oX) T □ 
Lemma 8. Let A £ R NxN and P L resp. Pr defined as above. Then 

(P L oA) T = P R oA T . (A.5) 

Proof. B u := (P L oA) u = A u (i > j) and Bj. = B n = Aji{j > i) = AjjP R = P r oA □ 
Lemma 9. LetA,B,C G R MxN . We then have 

tx(A T (BoC)) = tr(C r (BoA)) (A.6) 

Proof. tr(A r (BoC)) =lf =1 lf =1 A ;7 B i7 C i7 = tr(C T (BoA)) □ 
Lemma 10. Let A,B be lower triangular matrices. Then the following expression holds: 

P D o(AB) = (P D oA)(P D oB) . (A.7) 

Proof. AB is also lower trinangular and thus Pr> o (AB) = diag(aubu) = diag(au)diag(bu) = (Pd°A)(Pd o 
B) □ 
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Lemma 11. The formula 

P D o(A T ) = P D oA (A.8) 

holds for all quadratic matrices A. 

Proof {P D o (A T ))ij = StjAji = SijAij = (P d oA) t j □ 
Lemma 12. Let A be a nonsingular quadratic lower triangular matrix. Then the formula 

P D o(A- 1 ) = {PdoA)- 1 (A.9) 

holds. 

Proof. Using Lemma [101 we obtain (Pp o (A~ 1 ))(Pd oA) = Pd ° I = I. Since the quadratic matrices form a 
group, the inverse is unique. Therefore, equality between (Pd° {AT )) = (Pd° (A)) must hold. □ 

Lemma 13. Let A 6 M. NxN fte strictly lower triangular and B G M. NxN lower triangular. Then their product 
C = AB is strictly lower triangular. 

Proof. C is lower triangular and the diagonal entries are C,-, = AaBu = since B has a zero diagonal. □ 

Corollary 14. Let A € M. NxN be strictly lower triangular and D £ W NxN diagonal. Then their product 
C = AD is strictly lower triagonal. 

Lemma 15. Every quadratic matrix A can be written as the sum of a symmetric matrix S = I (A +A T ) and 
an antisymmetric matrix X = ~(A — A T ), i.e. 

A = S + X (A.10) 
Proof. A = \{A+A T +A-A T ) = \{A+A T ) + \{A-A T ) = S + X. □ 
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